ABSTRACT

Search engines are among the most useful and high-profile resources on the Internet. The problem of 
finding information on the Internet has been replaced with the problem of knowing where search 
engines are, what they are designed to retrieve, and how to use them. This article describes and 
evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with 
multiple remote search engines. The primary metasearch issue examined is the importance of 
carefully selecting and ranking remote search engines for user queries. We studied the efficacy of 
SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing 
the effect of time and experience on performance. We also compared the metaindex approach to the 
simpler categorical approach and showed how much experience is required to surpass the simple 
scheme. 

A solid background in the concept of Web search engines and metasearch engines, and some
experiments on SavvySearch, a metasearch engine designed by the authors, are provided in this 
paper. It includes easy-to-follow definitions of concepts necessary to an understanding of 
information retrieval, Web search engines, and metasearch engines. It is nice to see that concepts 
such as search engines (designed to aid in finding Web sites, given the exponentially growing 
number of sites) and metasearch engines (designed to aid in deciding which search engines to use, 
given the rapid growth in search engines), which have been known for years by library and 
information scientists, are being rediscovered by computer scientists. The authors note that a 
metasearch engine must have a dispatch mechanism to determine which search engines to employ, 
an interface agent to adapt a user query into a query suitable for each search engine employed, and 
a display mechanism by which to return the search results to the user. The paper provides a good 
literature search of available metasearch engines along with their Web site URLs. The authors 
explain how SavvySearch uses the keywords in the user's query to rank potential search engines 
that will eventually rank Web sites deemed relevant to the query. They note that the top search 
engines can be made to search in parallel. In order to rank search engines, they keep track of term 
frequencies at the sites searched by each search engine, and they keep track of the frequencies of 
success and failure of each search engine in terms of finding relevant sites for specific terms. The 
ranking of the search engines is accomplished by a complex formula based on concepts analogous to 
ranking via term weights in standard document retrieval. The ranking includes considerations of 
concurrency, expected network load, and local CPU load. One nice feature of the search engine 
ranking mechanism is the inclusion of thresholds for response times, leading to penalties for slow 
searches. The paper provides the results of a series of experiments with SavvySearch. A pilot study 
looked at how well search engines were being selected. The authors used a large set of queries (at 
least 2500). They varied the ordering of the search engines and the selection of the first group of 
search engines to be employed. Results indicate that their approach is viable, that users like the 
basic approach, that users follow more links found at the beginning of a search, and that past query 
success can be used to improve future searches. Further experiments looked at SavvySearch 
enhancements, such as penalties for lack of results and frequent updating of the meta-index, which 
is the data structure for information about search engine successes and failures and for term 
frequencies. Results were mixed, but, in general, SavvySearch's approach is a good one. The bottom 
line is that SavvySearch has garnered increased interest and use. It takes some experience for the 
system to learn enough about what is out there to improve on categorical searches done by other 
means. The approach is especially effective at figuring out where not to search. The authors continue 
to search for more efficient ways to use the Web to find relevant information.
Online Computing Reviews Service
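
The review above describes ranking search engines from per-term records of success and failure, with term-weight-style scoring and penalties for slow responses. A minimal Python sketch of that idea follows. The class name `MetaIndex`, the +1/-1 updates, the idf-like weight, and the flat slow-response penalty are all illustrative assumptions; the review does not give SavvySearch's actual formula, and this is not it.

```python
import math
from collections import defaultdict


class MetaIndex:
    """Toy metaindex (hypothetical): per-term, per-engine scores learned from feedback."""

    def __init__(self, engines):
        self.engines = list(engines)
        # scores[term][engine] -> accumulated effectiveness value
        self.scores = defaultdict(lambda: defaultdict(float))

    def record(self, terms, engine, success):
        """Reward an engine when a query succeeded, penalize it when it failed."""
        if not terms:
            return
        delta = 1.0 if success else -1.0
        for term in terms:
            # Spread the credit evenly over the query terms (an assumption).
            self.scores[term][engine] += delta / len(terms)

    def rank(self, terms, slow_engines=(), penalty=0.5):
        """Order engines for a query; terms known to fewer engines weigh more."""
        n = len(self.engines)
        totals = {engine: 0.0 for engine in self.engines}
        for term in terms:
            per_engine = self.scores.get(term, {})
            # Number of engines with any recorded experience for this term.
            df = sum(1 for value in per_engine.values() if value != 0.0)
            if df == 0:
                continue
            weight = math.log(n / df) + 1.0  # idf-like term weight
            for engine, value in per_engine.items():
                totals[engine] += weight * value
        # Flat penalty for engines that exceeded a response-time threshold.
        for engine in slow_engines:
            totals[engine] -= penalty
        return sorted(self.engines, key=lambda e: totals[e], reverse=True)


index = MetaIndex(["A", "B", "C"])
index.record(["python", "tutorial"], "A", success=True)
index.record(["python"], "B", success=False)
ranking = index.rank(["python"])
slow_ranking = index.rank(["python"], slow_engines=("A",), penalty=1.0)
```

In this sketch an engine with no recorded experience for a term keeps a neutral score of zero, so it ranks above engines that have failed on that term. That loosely mirrors the review's remark that the approach is especially effective at figuring out where not to search.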

context. We propose a personalized search approach that can easily extend a conventional
search engine on the client side. Our mapping framework automatically maps a set of 
known user interests onto a group of categories in the Open Directory Project (ODP) and 

The explosive growth of available information sources and the resulting information
overload pose several problems for users in many business organizations and educational 
institutions. First, searching through several information sources, one at a time, is a 
source of enormous frustration for users. Second, top-ranked documents in search results 
are frequently irrelevant to what users are interested in. To address these problems, we 
have developed ixmeta™, a powerful metasearch engine tha ... 

The proliferation of online information resources increases the importance of effective and
efficient information retrieval in a multicollection environment. Multicollection searching is 
cast in three parts: collection selection (also referred to as database selection), query 
processing and results merging. In this work, we focus our attention on the evaluation of 
the first step, collection selection. In this article, we present a detailed discussion of the 
methodology that we used to evaluate an ... 

Web search engines work well for finding crawlable pages, but not for finding datasets
hidden behind Web search forms. We describe a novel technique for detecting search 
forms, which could be the basis for a next-generation distributed search application. We 
use automatic feature generation to describe candidate forms and C4.5 decision trees to 
classify them. In two testbeds, we get an accuracy of more than 85% and a precision of
more than 87%. One of our decision trees is effective on both test ...

SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines
two complementary existing protocols, SDLIP and STARTS, to define a uniform interface 
that collections should support for searching and exporting metasearch-related metadata. 
SDARTS also includes a toolkit with wrappers that are easily customized to make both 
local and remote document collections SDARTS-compliant. This paper describes two 
significant ways in which we have extended the SDARTS toolkit. First, we ... 

Recent increase in the number of search engines on the Web and the availability of meta
search engines that can query multiple search engines makes it important to find effective 
methods for combining results coming from different sources. In this paper we introduce 
novel methods for reranking in a meta search environment based on expert agreement 
and contents of the snippets. We also introduce an objective way of evaluating different 
methods for ranking search results that is based upon implici ... 

Internet search engines and comparison shopping have recently begun implementing a
paid placement strategy, where some content providers are given prominent positioning in 
return for a placement fee. This bias generates placement revenues but creates a disutility 
to users, thus reducing user-based revenues. We formulate the search engine design 
problem as a tradeoff between these two types of revenues. We demonstrate that the 
optimal placement strategy depends on the relative benefits (to provid ... 

Frequently a user's information needs are stored in the databases of multiple search 
engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search 
engines and identify useful documents from the returned results. To support unified access 
to multiple search engines, a metasearch engine can be constructed. When a metasearch 
engine receives a query from a user, it invokes the underlying search engines to retrieve 
useful information for the user. Metasearch engines have ... 

Keywords: Collection fusion, distributed collection, distributed information retrieval, 
information resource discovery, metasearch 



10 A highly scalable and effective method for metasearch
Weiyi Meng, Zonghuan Wu, Clement Yu, Zhuogang Li 

July 2001 ACM Transactions on Information Systems (TOIS), volume 19 issue 3 

Publisher: ACM Press

Full text available: pdf(653.63 KB) Additional Information: full citation, abstract, references, citings, index terms

A metasearch engine is a system that supports unified access to multiple local search 
engines. Database selection is one of the main challenges in building a large-scale 
metasearch engine. The problem is to efficiently and accurately determine a small number 
of potentially useful local search engines to invoke for each user query. In order to enable 
accurate selection, metadata that reflect the contents of each search engine need to be 
collected and used. This article proposes a highly scalable ... 

User profiles are the central component of most personalized Web information agents.
They consist of a set of models representing the various topics of interest to the user. 
Often the agent learns the user's preferences from examples of documents deemed 
relevant to the user. The topic of the document can either be supplied by the user (active 
modeling), or it must be guessed by the agent (passive modeling), which is more 
convenient but is expected to diminish the agent's accuracy. We presen ... 

A well-known problem for web search is targeting search on information that satisfies
users' information needs. User queries tend to be short, and hence often ambiguous, 
which can lead to inappropriate results from general-purpose search engines. This has led 
to a number of methods for narrowing queries by adding information. This paper presents 
an alternative approach that aims to improve query results by using knowledge of a user's 
current activities to select search engines relevant t ... 

Significant efforts are being made to digitize rare and valuable library materials, with the
goal of providing patrons and historians digital facsimiles that capture the "look and feel" 
of the original materials. This is often done by digitally photographing the materials and 
making high resolution 2D images available. The underlying assumption is that the objects 
are flat. However, older materials may not be flat in practice, being warped and crinkled 
due to decay, neg ... 

Keywords: World Wide Web, distributed information retrieval, effectiveness evaluation, 
server selection 



15 Personalization and retrieval: Capturing community search expertise for personalized web search using snippet-indexes
Oisin Boydell, Barry Smyth

November 2006 Proceedings of the 15th ACM international conference on Information 
and knowledge management CIKM '06 

Publisher: ACM Press 

Full text available: pdf(288.43 KB) Additional Information: full citation, abstract, references, index terms

We describe and evaluate an approach to capturing and re-using search expertise within a 
community of like minded searchers, such as the employees of a company or organisation. 
Within knowledge based industries, search expertise - the ability to quickly and accurately 
locate information according to a specific information need - is an important corporate 
asset and in our approach we attempt to capture this knowledge by mining the title and 
snippet texts of results that have been selected by comm ... 

With the explosive growth of the World Wide Web, the public is gaining access to massive
amounts of information. However, locating needed and relevant information remains a 
difficult task, whether the information is textual or visual. Text search engines have 
existed for some years now and have achieved a certain degree of success. However, 
despite the large number of images available on the Web, image search engines are still 
rare. In this article, we show that in order to allow people to profi ... 